77 results found.
Written
Corpus,
Language Type:
Multilingual
Languages:
Afrikaans Albanian Amharic Arabic Aragonese Armenian Assamese Azerbaijani Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Catalan Central Khmer Chinese Croatian Czech Danish Dutch Dzongkha English Esperanto Estonian Finnish French Gaelic Galician Georgian German Greek Gujarati Hausa Hebrew Hindi Hungarian Icelandic Igbo Indonesian Irish Italian Japanese Kannada Kazakh Kinyarwanda Korean Kurdish Kyrgyz Latvian Limburgan Lithuanian Macedonian Malagasy Malay Malayalam Maltese Marathi Mongolian Nepali Northern Sami Norwegian Norwegian Bokmål Norwegian Nynorsk Occitan Oriya Panjabi Pashto Persian Polish Portuguese Romanian Russian Serbian Serbo-Croatian Sinhala Slovak Slovenian Spanish Swedish Tajik Tamil Tatar Telugu Thai Turkish Turkmen Uighur Ukrainian Urdu Uzbek Vietnamese Walloon Welsh Western Frisian Xhosa Yiddish Yoruba Zulu
Availability:
Freely Available
License:
Size:
55 million sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
- Paper track: Long/Machine Translation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Biao Zhang | the open parallel corpus (OPUS) | /N |
Documentation:
None
Not Applicable
Contextualised word embeddings,
Language Type:
Monolingual
Languages:
Ancient Arabic Basque Bokmål Bulgarian Catalan Chinese Church Croatian Czech Danish Dutch English Estonian Finnish French Galician German Greek Hebrew Hindi Hungarian Indonesian Irish Italian Japanese Korean Latin Latvian Norwegian Nynorsk Old Persian Polish Portuguese Romanian Russian Simplified Chinese Slavonic Slovak Slovene Spanish Swedish Turkish Ukrainian Urdu Uyghur Vietnamese
Availability:
Freely Available
License:
none
Size:
18.4 GByte
Production Status:
Existing-used
Use:
Parsing and Tagging
- Paper title: Treebank Embedding Vectors for Out-of-domain Dependency Parsing
- Paper track: Short/Syntax: Tagging, Chunking and Parsing
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Joachim Wagner | Elmo For Many Languages | /N |
Documentation:
https://www.aclweb.org/anthology/K18-2005/
Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
Hungarian
Availability:
From Owner
License:
Size:
None
Production Status:
Existing-used
Use:
ASR rich transcription
- Paper title: Leveraging a character, word and prosody triplet for an ASR error robust and agglutination friendly punctuation approach
- Paper track: 10.5 Rich transcription/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | György Szaszák | Hungarian Broadcast News Database | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Hungarian
Availability:
From Data Center(s)
License:
ELRA
Size:
None
Production Status:
Existing-used
Use:
ASR rich transcription
- Paper title: Leveraging a character, word and prosody triplet for an ASR error robust and agglutination friendly punctuation approach
- Paper track: 10.5 Rich transcription/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | György Szaszák | BABEL Hungarian Speech Databases | /N |
Documentation:
English and Hungarian
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Basque Belgian Dutch Croatian Czech Galician Greek Hungarian Portuguese Slovak Slovenian Spanish
Availability:
From Owner
License:
Size:
None
Production Status:
Existing-used
Use:
Evaluation/Validation
- Paper title: An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs
- Paper track: 5.4 Speech and audio segmentation/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Lukas Mateju | COST278 database | /N |
Documentation:
None
Written
Lexicon,
Language Type:
Multilingual
Languages:
'Auhelawa Abau Aceh Achang Acholi Achuar-Shiwiar Aché Adamawa Fulfulde Adele Adhola Adi Adioukrou Aekyom Afrikaans Agarabi Aguacateco Aguaruna Agusan Manobo Agutaynen Aimol Ajië Ajyíninka Apurucayali Akawaio Akeu Akha Akoose Alamblak Alangan Alekano Algonquin Alladian Alune Alur Ama Amanab Amarakaeri Amarasi Ambai Ambulas Amele Amganad Ifugao Amharic Amri Ancient Greek Aneme Wake Angaatiha Angal Heneng Angami Naga Angor Anjam Anufo Ao Naga Apalaí Apatani Apinayé Apurinã Arabela Arifama-Miniafia Armenian Arop-Lukep Arosi Aruamu Asháninka Ashéninka Pajonal Assyrian Neo-Aramaic Ata Manobo Atatláhuca Mixtec Au Aukan Avar Avokaya Awa Awa-Cuaiquer Awadhi Awiyaana Ayoreo Ayutla Mixtec Azerbaijani Baatonum Baba Malay Bafia Bafut Bahasa Melayu Baka Bakairí Balangao Balantak Bali Bamanankan Bana Bandial Banggai Baoulé Barai Barasana Bargam Bariai Baruya Bashkir Basque Bassari Batad Ifugao Batak Angkola Batak Dairi Batak Karo Batak Simalungun Batak Toba Bauzi Bavarian Bawm Chin Bedjond Beembe Bekwarra Belarusan Belize Kriol English Bemba Bembe Benabena Bengali Berom Bete-Bendi Biangai Biatah Biete Bima Bimin Bimoba Bine Binukid Binumarien Bislama Bissa Bisu Bokmal Norwegian Boko Bokobaru Bola Bomu Bora Border Kuna Borong Bribri Buamu Bugawac Bugis Buhid Bukiyip Bulgarian Buli Bulu Bumbita Arapesh Bunama Burarra Burmese Burum-Mindik Busa Cabécar Cacua Caluyanun Cameroon Mambila Camsá Candoshi-Shapra Capanahua Caquinte Car Nicobarese Carapana Carib Caribbean Hindustani Caribbean Javanese Carrier Cashibo-Cacataibo Cashinahua Casiguran Dumagat Agta Catalan-Valencian-Balear Cavineña Cebuano Central Aymara Central Bicolano Central Bontok Central Cagayan Agta Central Cakchiquel Central Dusun Central Huasteca Nahuatl Central Khmer Central Kurdish Central Mnong Central Yupik Central-Eastern Niger Fulfulde Cerma Chachi Chamacoco Chamorro Chang Naga Chavacano Chayahuita Chayuco Mixtec Chechen Cherokee Chhattisgarhi Chipaya Chiquihuitlán Mazatec Chiquitano Chiripá Chopi Chortí Chothe 
Naga Chuave Chumburung Chuukese Chuvash Chácobo Cishingini Coatlán Mixe Coatzospan Mixtec Cofán Cogui Colorado Comaltepec Chinantec Coptic Cornish Cotabato Manobo Croatian Cubeo Cuiba Culina Czech Dadibi Daga Dan Dangaléat Danish Dano Dawawa Dawro Dedua Deg Denya Desano Dhao Dibabawon Manobo Digo Dii Dimasa Djambarrpuyngu Djimini Senoufo Dobu Dogrib Doyayo Dupaninan Agta Duri Duruma Dutch East Kewa Eastern Bolivian Guaraní Eastern Bontok Eastern Bru Eastern Canadian Inuktitut Eastern Highland Chatino Eastern Huasteca Nahuatl Eastern Jacalteco Eastern Kanjobal Eastern Krahn Eastern Mari Eastern Oromo Eastern Tawbuid Efik Ejagham Ekajuk El Nayar Cora Endo English Enxet Erzya Ese Ese Ejja Esperanto Estonian Ewage-Notu Ezaa Faiwol Falam Chin Farefare Faroese Fasu Fijian Fijian Hindustani Filipino Finnish Fon Fore French Ga Ga'dang Gagauz Galela Galo Adi Gamo Ganda Gangte Garifuna Garo Gbagyi Gen Georgian Gheg Albanian Ghomálá' Gidar Gikuyu Gikyode Girawa Gofa Gogo Gokana Golin Gonja Gor Gorontalo Gourmanchéma Greek Guahibo Guajajára Guambiano Guanano Guarayu Guayabero Gude Guerrero Amuzgo Guerrero Nahuatl Guhu-Samane Guinea Kpelle Gujarati Gulay Gumatj Gumuz Gun Gusii Gwahatike Gwich'in Haitian Creole French Haka Chin Hakka Chinese Halh Mongolian Halia Hamer-Banna Hanga Hanunoo Hausa Hawai'i Creole English Haya Hebrew Hehe Helong Highland Oaxaca Chontal Highland Puebla Nahuatl Hiligaynon Hindi Hiri Motu Hixkaryána Hmong Daw Hmong Njua Hopi Hote Hrangkhol Huambisa Huautla Mazatec Huichol Huli Hungarian Iatmul Iban Ibatan Icelandic Igbo Ignaciano Ika Ikwere Ikwo Ilianen Manobo Ilocano Imbongu Inabaknon Indonesian Inga Inoke-Yate Iraqw Iraya Iriga Bicolano Irigwe Irish Gaelic Islander Creole English Isnag Isthmus Mixe Isthmus-Mecayapan Nahuatl Italian Itawit Iu Mien Ivatan Ivbie North-Okpela-Arhe Iwal Iyo Iyo'wujwa Chorote Iyojwa'ja Chorote Izere Izii Jalapa de Díaz Mazatec Jamaican Creole English Jamiltepec Mixtec Japanese Jarai Javanese Jingpho Jola-Fonyi Jola-Kasa 
Jukun Takum Jula Juquila Mixe Jur Modo Kabiyé Kabyle Kadiwéu Kafa Kagayanen Kagulu Kahua Kaingáng Kaiwá Kako Kalagan Kalam Kalanga Kamano Kamasau Kambaata Kamwe Kandawo Kanite Kankanaey Kannada Kapingamarangi Kara Karachay-Balkar Karajá Karakalpak Karamojong Karbi Kasua Kayabí Kazakh Keapara Kein Kekchí Kele Keley-I Kallahan Kenga Kenyang Keyagana Khakas Khiamniungan Naga Kim Kimré Kinaray-A Kire Kirghiz Kiribati Kisar Kituba Kobon Kom Komba Komi-Zyrian Komso Konai Konni Kono Konyak Naga Koongo Koorete Korafe Korean Koreguaje Koronadal Blaan Kosena Kouya Koya Krio Kuanua Kube Kukele Kuku-Yalanji Kumam Kuman Kumyk Kunimaipa Kuot Kupang Malay Kupsabiny Kuranko Kusaal Kutep Kutu Kuwaa Kuwaataay Kwaio Kwanga Kwanyama Kwara'ae Kwere Kwoma Kyaka Laari Lacandon Ladakhi Lahu Lahu Shi Lalana Chinantec Lama Lamba Lambya Lamkang Lampung Lango Lao Lashi Latin Latvian Lealao Chinantec Ledo Kaili Lega-Mwenga Lelemi Lengua Lenje Lewo Lhomi Liangmai Naga Limbu Limbum Limos Kalinga Lingala Literary Chinese Lithuanian Lobi Loma Low Saxon Lozi Luang Lukpa Luo Luwo Lyélé Ma'anyan Ma'di Maasina Fulfulde Mabaan Maca Macedonian Machame Machiguenga Macuna Macushi Mada Madak Madura Mafa Maithili Maiwa Makaa Makasar Makonde Malayalam Malba Birifor Male Maltese Mamanwa Mamara Senoufo Mamasa Mampruli Manam Mandinka Mangga Buang Manggarai Mangseng Manikion Mankanya Mansaka Maori Mape Mapos Buang Mapudungun Maram Naga Maranao Marathi Marba Marik Maring Naga Marshallese Maru Masaba Masana Masbatenyo Maskelynes Matal Matigsalug Manobo Matsés Mauwake Maxakalí Mayo Mayoyao Ifugao Mazahua Central Mazatlán Mixe Mbay Mbo-Ung Mbuko Mbula Mbunda Mbyá Guaraní Mekeo Melpa Mende Mengen Mentawai Merey Meyah Mian Michoacán Nahuatl Micmac Middle English Min Nan Chinese Minangkabau Minaveha Minica Huitoto Mizo Moba Mocoví Mofu-Gudur Mokole Molima Mong Leng Mong Njua Mongo-Nkundu Mongondow Mono Moose Cree Mopán Maya Morisyen Moro Moskona Motu Mountain Koiali Moyon Naga Mufian Muinane Mumuye Muna Mundang Mundani 
Mundurukú Murle Murui Huitoto Musey Muyang Mískito Mòoré Mün Chin Mündü Naasioi Nabak Nadëb Nafaanra Nakanai Nalca Nama Nande Nandi Naro Navajo Nawdm Ndamba Ndau Ndebele Ndo Ndogo Ndonga Nepali Nga La Ngaju Ngangam Ngawn Chin Ngiemboon Ngindo Ngiti Ngombe Ngulu Ngäbere Nias Nigeria Mambila Nigerian Fulfulde Nii Nilamba Ninzo Nivaclé Nkonya Nobonob Nocte Naga Nogai Nomaande Nomatsiguenga Noone Nopala Chatino North Alaskan Inupiatun North Mofu Northeastern Dinka Northern Dagara Northern Emberá Northern Grebo Northern Khmer Northern Kissi Northern Kurdish Northern Mam Northern Oaxaca Nahuatl Northern Puebla Nahuatl Northwest Alaska Inupiatun Northwest Gbaya Ntcham Numanggang Nyindrou Nyishi Nynorsk Norwegian Obolo Ocotepec Mixtec Ogea Old Church Slavonic Olusamia Ozumacín Chinantec Palantla Chinantec Pamplona Atta Paraguayan Guarani Patpatar Pele-Ata Peñoles Mixtec Phom Naga Pichis Ashéninka Pinotepa Nacional Mixtec Plapo Krumen Psikye Pular Qaqet Quiotepec Chinantec Rabinal Achí Russia Buriat Rwanda S'gaw Karen Sa'a Saamia Sabu Safeyoka Saint Lucian Creole French Samba Leko San Blas Kuna San Jerónimo Tecóatl Mazatec San Juan Colorado Mixtec San Juan Cotzal Ixil San Mateo del Mar Huave San Miguel el Grande Mixtec San Pedro Amuzgos Amuzgo San Sebastián Coatán Chuj Santa María Zacatepec Mixtec Santa Teresa Cora Sar Sarangani Blaan Sarangani Manobo Sateré-Mawé Sea Island Creole English Sekpele Sepik Iwam Seselwa Creole French Sharanahua Shuar Silacayoapan Mixtec Siyin Chin Sochiapan Chinantec South Fali South Giziga Southern Altai Southern Birifor Southern Bobo Madaré Southern Carrier Southern Ghale Southern Kalinga Southern Kisi Southern Nambikuára Southern Nuni Southern Puebla Mixtec Southwest Gbaya Southwestern Dinka Standard Arabic Standard German Tabasco Chontal Tabo Tagabawa Takuu Tangkhul Naga Tataltepec Chatino Tedim Chin Tenango Nahuatl Tepetotutla Chinantec Tepeuxila Cuicatec Tetelcingo Nahuatl Teutila Cuicatec Tezoatlán Mixtec Tlahuitoltepec Mixe Tol Toro So 
Dogon Totontepec Mixe Toura Tsikimba Tsimané Tuma-Irumu Tumbalá Chol Tungag Tuwali Ifugao Uab Meto Ucayali-Yurúa Ashéninka Umanakaina Umiray Dumaget Agta Una Usila Chinantec Vengo Veracruz Huastec Waimaha Wancho Naga Wandala Waorani Wayuu Welsh West Kewa West-Central Limba Western Apache Western Arrarnta Western Bolivian Guaraní Western Bukidnon Manobo Western Frisian Western Highland Chatino Western Huasteca Nahuatl Western Kanjobal Western Niger Fulfulde Wichí Lhamtés Güisnay Wichí Lhamtés Nocten Wipi Woun Meu Xaasongaxango Yabem Yanesha' Yocoboué Dida Yosondúa Mixtec Zaiwa Zarma Zemba Zotung Chin Zulgo-Gemzek Éwé Ömie
Availability:
Freely Available
License:
CC BY-NC-ND license (Attribution-NonCommercial-NoDerivs)
Size:
7 MByte
Production Status:
Newly created-finished
Use:
Opinion Mining/Sentiment Analysis
- Paper title: UniSent: Universal Adaptable Sentiment Lexica for 1000+ Languages
- Paper track: Terminology/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Ehsaneddin Asgari | UniSent | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Basque English Finnish French Hungarian Romanian
Availability:
Freely Available
License:
MIT License
Size:
8130 sentences
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible
- Paper track: Speech/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Marcely Zanon Boito | MaSS dataset | /N |
Documentation:
Documentation in English at the github page
Multimodal/Multimedia
Educational materials and knowledge dissemination,
Language Type:
Multilingual
Languages:
Dutch English German Hungarian Polish
Availability:
Freely Available
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Newly created-in progress
Use:
Educational materials and knowledge dissemination
- Paper title: Languagesindanger.eu - including multimedia language resources to disseminate knowledge and create educational material on less‑resourced languages
- Paper track: Multimodality
- Paper status: Accept Poster+DemoSuggested
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Dagmar Jung | University of Cologne | DE |
| Author 2 | Katarzyna Klessa | The Institute of Linguistics, Adam Mickiewicz University in Poznan, Poland | PL |
| Author 3 | Zsuzsa Duray | Research Institute for Linguistics, Hungarian Academy of Sciences | HU |
| Author 4 | Beatrix Oszkó | Research Institute for Linguistics, Hungarian Academy of Sciences | None |
| Author 5 | Mária Sipos | Research Institute for Linguistics, Hungarian Academy of Sciences | HU |
| Author 6 | Sándor Szeverényi | Research Institute for Linguistics, Hungarian Academy of Sciences | HU |
| Author 7 | Zsuzsa Várnai | Research Institute for Linguistics, Hungarian Academy of Sciences | HU |
| Author 8 | Paul Trilsbeek | Max Planck Institute for Psycholinguistics, Nijmegen | NL |
| Author 9 | Tamás Váradi | Research Institute for Linguistics, Hungarian Academy of Sciences | None |
| Main Contact | Katarzyna Klessa | The Institute of Linguistics, Adam Mickiewicz University in Poznan, Poland | None |
Documentation:
<Not Specified>
Written
Lexicon,
Language Type:
Multilingual
Languages:
English German Hungarian Iranian Persian
Availability:
Freely Available
License:
CreativeCommons
Size:
25 <Not Specified>
Production Status:
Newly created-in progress
Use:
<Not Specified>
- Paper title: Accessing and standardizing Wiktionary lexical entries for the translation of labels in Cultural Heritage taxonomies
- Paper track: Written
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Thierry Declerck | DFKI GmbH | None |
| Author 2 | Karlheinz Mörth | Austrian Academy of Sciences | None |
| Author 3 | Piroska Lendvai | Hungarian Academy of Sciences | None |
| Main Contact | Thierry Declerck | DFKI GmbH | DE |
Documentation:
<Not Specified>
Language Type:
Multilingual
Languages:
English Hungarian
Availability:
Freely Available
License:
Creative Commons
Size:
14000 sentences
Production Status:
Newly created-finished
Use:
Information Extraction, Information Retrieval
- Paper title: Light Verb Constructions in the SzegedParalellFX English--Hungarian Parallel Corpus
- Paper track: Written
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country | Second Affiliation | Second Affiliation Country |
|---|---|---|---|---|---|
| Author 1 | Veronika Vincze | University of Szeged | None | Hungarian Academy of Sciences | None |
| Main Contact | Veronika Vincze | University of Szeged | HU | MTA-SZTE Research Group on Artificial Intelligence | HU |
Documentation:
guidelines in English and Hungarian




